Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service for orchestrating and automating complex workflows in the cloud with Apache Airflow. It offers two network access modes for the Apache Airflow web UI: public and private. Many customers deploy Amazon MWAA in private mode because they want to use their existing authentication systems and single sign-on (SSO) to integrate with their corporate Active Directory (AD). With this setup, end users don't need to sign in to the AWS Management Console to reach the Airflow UI.
In this article, we show how to configure an Amazon MWAA environment in private network access mode with customer-managed VPC endpoints, and how to authenticate users through SAML federation with Microsoft Entra ID and an Application Load Balancer (ALB). With this integration, users sign in to the Airflow UI with their corporate credentials and access their directed acyclic graphs (DAGs). The same approach also works for Amazon MWAA's public network access mode.
Solution Overview
The architecture for authenticating to the Amazon MWAA environment with SAML SSO includes several components, as shown in the accompanying architecture diagram. The infrastructure consists of two public subnets and three private subnets. The public subnets host the internet-facing ALB. Two of the private subnets host the Amazon MWAA environment, and the third private subnet hosts the AWS Lambda authorizer function. A NAT gateway is attached to this subnet because the function must verify the token's signer, confirming that the JWT header contains the expected ALB ARN.
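The signer check described above can be sketched with the Python standard library. When an ALB authenticates a request, it forwards the user's claims to the target in the `x-amzn-oidc-data` header as an ES256-signed JWT whose header includes a `signer` field containing the ALB ARN. This is a minimal sketch, assuming the Lambda function reads that header from the incoming event; the helper names (`b64url_decode`, `read_alb_jwt_header`, `signer_matches`) are hypothetical. It inspects only the `signer` field. A complete authorizer must also verify the signature using the public key the ALB publishes at `https://public-keys.auth.elb.<region>.amazonaws.com/<kid>`, and fetching that key from a private subnet requires outbound internet access, such as through a NAT gateway.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url JWT segment, restoring any stripped '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def read_alb_jwt_header(encoded_jwt: str) -> dict:
    """Return the decoded JSON header (first segment) of an ALB-issued JWT."""
    header_segment = encoded_jwt.split(".")[0]
    return json.loads(b64url_decode(header_segment))


def signer_matches(encoded_jwt: str, expected_alb_arn: str) -> bool:
    """Check that the JWT header's 'signer' field equals the expected ALB ARN.

    NOTE: this only inspects the header. It does NOT verify the ES256
    signature, which requires the public key published for the header's 'kid'.
    """
    header = read_alb_jwt_header(encoded_jwt)
    return header.get("signer") == expected_alb_arn
```

In a handler, a call might look like `signer_matches(event["headers"]["x-amzn-oidc-data"], expected_arn)`, assuming the ALB event exposes lowercase header names and `expected_arn` comes from the function's configuration.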
The workflow follows these key steps:
- Microsoft Entra ID acts as the identity provider (IdP) for SAML configuration.
- Amazon Cognito functions as the service provider (SP).
- The ALB supports Amazon Cognito natively for request authentication.
- After authentication, the ALB forwards the request to the Lambda authorizer function. This function decodes the user's JWT and checks whether the user's AD group maps to the appropriate AWS Identity and Access Management (IAM) role.
- Upon validation, the function generates a web login token and redirects the user to the Amazon MWAA environment for a successful login.
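The last two workflow steps can be sketched as string and response construction. In the Amazon MWAA documentation, an IAM policy grants `airflow:CreateWebLoginToken` on a resource ARN of the form `arn:aws:airflow:<region>:<account-id>:role/<environment-name>/<airflow-role>`, and the `CreateWebLoginToken` API (in boto3, `client("mwaa").create_web_login_token(Name=...)`) returns `WebToken` and `WebServerHostname`. This minimal sketch builds only the strings and an ALB-compatible HTTP 302 response; the AWS calls themselves (assuming the mapped IAM role, then requesting the token) are left out so the example runs without credentials. The function names are hypothetical, and the `/aws_mwaa/aws-console-sso?login=true#<token>` URL pattern should be confirmed against the current Amazon MWAA documentation.

```python
def airflow_role_arn(region: str, account_id: str, environment: str, airflow_role: str) -> str:
    """Resource ARN on which an IAM policy grants airflow:CreateWebLoginToken."""
    return f"arn:aws:airflow:{region}:{account_id}:role/{environment}/{airflow_role}"


def web_login_redirect_url(web_server_hostname: str, web_token: str) -> str:
    """URL that exchanges a web login token for an Airflow UI session."""
    return f"https://{web_server_hostname}/aws_mwaa/aws-console-sso?login=true#{web_token}"


def redirect_response(location: str) -> dict:
    """Minimal Lambda response, in the shape an ALB expects, that issues a 302."""
    return {
        "statusCode": 302,
        "statusDescription": "302 Found",
        "headers": {"Location": location},
        "isBase64Encoded": False,
        "body": "",
    }
```

Keeping the token in the URL fragment (after `#`) follows the documented pattern; browsers don't send the fragment to the server in the request line.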
High-Level Deployment Steps
- Create an Amazon Simple Storage Service (Amazon S3) bucket for storing artifacts.
- Generate an SSL certificate and upload it to AWS Certificate Manager (ACM).
- Utilize AWS CloudFormation to deploy the Amazon MWAA infrastructure stack.
- Configure Microsoft Entra ID services and link the Amazon Cognito user pool.
- Deploy the ALB CloudFormation stack.
- Access Amazon MWAA using Microsoft Entra ID user credentials.
Prerequisites
Before proceeding, ensure you have the following prerequisites:
- An AWS account.
- Necessary IAM permissions to deploy AWS CloudFormation resources.
- A Microsoft Azure account with a Microsoft Entra ID P2 license, to create the Microsoft Entra ID application (IdP configuration).
- A public certificate for the ALB in the AWS region where the infrastructure will be deployed, along with a custom domain name associated with the certificate.
Creating an S3 Bucket
In this step, you create an S3 bucket to store your Airflow DAGs, custom plugins (as a ZIP file), and Python dependencies (in a requirements.txt file). The Amazon MWAA environment retrieves DAGs and other required files from this bucket.
- On the Amazon S3 console, select the region for your bucket.
- Navigate to the Buckets section.
- Click on “Create bucket.”
- Choose “General purpose” for the bucket type.
- Input a name for your bucket (e.g., mwaa-sso-blog-).
- Click “Create bucket.”
- Go to the newly created bucket and select “Create folder.”
- Name this folder (e.g., dags) and choose “Create folder.”
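Bucket names must be globally unique and follow the S3 naming rules. The following minimal sketch checks a candidate name against a subset of the general purpose bucket rules: 3 to 63 characters; only lowercase letters, digits, periods, and hyphens; a letter or digit at each end; no adjacent periods; not formatted as an IP address; and no `xn--` prefix. It is not exhaustive, and `is_valid_bucket_name` is a hypothetical helper, not an AWS API. A name that ends in a hyphen, such as the bare example prefix in the steps above, fails the check, so complete it with your own suffix.

```python
import ipaddress
import re

# Length 3-63, allowed characters, and a letter or digit at both ends.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate S3 bucket name against core naming rules (a subset)."""
    if not _BUCKET_RE.match(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
        return False  # must not be formatted as an IP address
    except ValueError:
        pass
    if name.startswith("xn--"):
        return False  # reserved prefix
    return True
```

For example, `is_valid_bucket_name("mwaa-sso-blog-")` returns `False` because of the trailing hyphen.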
Importing Certificates into ACM
ACM integrates with Elastic Load Balancing, including ALBs. In this step, you can either request a public certificate through ACM or import an existing certificate. The following steps import an existing certificate.
- In the ACM console, select “Import certificate” from the navigation menu.
- For the Certificate body, input the contents of the cert.pem file.
- For the Certificate private key, enter the contents of the privatekey.pem file.
- Click “Next.”
- Select “Review and import.”
- Review the certificate details and click “Import.”
Upon successful import, the status of the certificate will be updated to “Issued.”
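Before pasting the files into the import form, it can help to confirm they contain what ACM expects: PEM-encoded input and an unencrypted private key (ACM doesn't accept a key protected by a passphrase). This minimal sketch assumes the `cert.pem` and `privatekey.pem` contents used above; `pem_labels` and `acm_import_problems` are hypothetical helpers, and they look only at PEM block labels, not at whether the key actually matches the certificate.

```python
import re

# Matches one PEM block and captures its label, e.g. "CERTIFICATE".
_PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----.*?-----END \1-----", re.DOTALL)


def pem_labels(pem_text: str) -> list:
    """Return the labels of all complete PEM blocks, in order."""
    return _PEM_BLOCK.findall(pem_text)


def acm_import_problems(cert_pem: str, key_pem: str) -> list:
    """List label-level problems that would block an ACM certificate import."""
    problems = []
    cert_labels = pem_labels(cert_pem)
    if not cert_labels or cert_labels[0] != "CERTIFICATE":
        problems.append("certificate body must begin with a CERTIFICATE block")
    key_labels = pem_labels(key_pem)
    if "ENCRYPTED PRIVATE KEY" in key_labels or "Proc-Type: 4,ENCRYPTED" in key_pem:
        problems.append("private key is encrypted; ACM needs an unencrypted key")
    elif not any(label.endswith("PRIVATE KEY") for label in key_labels):
        problems.append("private key file has no PRIVATE KEY block")
    return problems
```

An empty list means the labels look right; any other result names the file to fix before importing.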
Setting Up Microsoft Entra ID Services, Users, Groups, and Enterprise Application
To enable SSO with Microsoft Entra ID, you need an enterprise application that acts as the IdP in the SAML flow. You add users and groups to this application and configure the Amazon Cognito details as the SP.
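On the Entra ID side, the enterprise application's Basic SAML Configuration needs two values derived from the Amazon Cognito user pool: the Identifier (Entity ID), which Cognito defines as `urn:amazon:cognito:sp:<user-pool-id>`, and the Reply URL (Assertion Consumer Service URL), which for a Cognito domain prefix is `https://<domain-prefix>.auth.<region>.amazoncognito.com/saml2/idpresponse`. The helper below is a hypothetical convenience that assembles both; the user pool ID and domain prefix in the usage example are placeholders. With a custom Cognito domain, the Reply URL would be `https://<custom-domain>/saml2/idpresponse` instead.

```python
def cognito_saml_sp_settings(user_pool_id: str, domain_prefix: str, region: str) -> dict:
    """Build the Basic SAML Configuration values for the Entra ID application."""
    return {
        # Identifier (Entity ID): the Cognito user pool's SAML SP URN
        "identifier": f"urn:amazon:cognito:sp:{user_pool_id}",
        # Reply URL (ACS): the user pool domain's SAML response endpoint
        "reply_url": f"https://{domain_prefix}.auth.{region}.amazoncognito.com/saml2/idpresponse",
    }


if __name__ == "__main__":
    # Placeholder values: substitute your own user pool ID, domain prefix, and Region.
    settings = cognito_saml_sp_settings("us-east-1_EXAMPLE", "mwaa-sso-demo", "us-east-1")
    for key, value in settings.items():
        print(f"{key}: {value}")
```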
Airflow includes five default roles: Public, Admin, Op, User, and Viewer. This article focuses on three of them: Admin, User, and Viewer. We create one group for each of these roles and create users with the appropriate group memberships.
- Log into the Azure portal.
- Navigate to “Enterprise applications” and select “New application.”
- Name your application (e.g., mwaa-environment) and click “Create.”
- Create three groups: search for “Microsoft Entra ID,” select “Add,” and choose “Group.”
- Specify a group type (e.g., Security) and provide a name (e.g., airflow-admins) with a description.
- Click “Create.” Repeat to create groups named airflow-users and airflow-viewers.
- Record the object IDs for each group, as they will be needed later.
- Create users: on the Microsoft Entra ID Overview page, select “Add,” then choose “User” to create a new user.
- Enter the necessary user details (e.g., mwaa-user), then click “Review + create.”
- Repeat this for another user called mwaa-admin.
- In the airflow-users group details, select “Members,” then “Add members,” and search for and select the users you created.
- Repeat for the other groups, assigning users to each group as needed.
- Navigate to your application and choose “Assign users and groups.”
- Click “Add user/group,” search for the created groups, and select them.
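The group object IDs recorded earlier are what the Lambda authorizer can use to decide which Airflow role a user receives. This is a minimal sketch, assuming the user's group object IDs arrive as a list (for example, from a groups claim that Entra ID emits and Cognito passes through in a custom attribute). The object IDs below are placeholders, and `resolve_airflow_role` is a hypothetical helper. Picking the most privileged matching role is a design choice of this sketch; an implementation could instead reject users who belong to more than one Airflow group.

```python
from typing import Iterable, Optional

# Placeholder object IDs: replace with the IDs recorded for your Entra ID groups.
GROUP_TO_AIRFLOW_ROLE = {
    "00000000-0000-0000-0000-00000000000a": "Admin",   # airflow-admins
    "00000000-0000-0000-0000-00000000000b": "User",    # airflow-users
    "00000000-0000-0000-0000-00000000000c": "Viewer",  # airflow-viewers
}

# Most privileged first, so a member of several groups gets the highest role.
ROLE_PRECEDENCE = ("Admin", "User", "Viewer")


def resolve_airflow_role(group_ids: Iterable[str]) -> Optional[str]:
    """Return the Airflow role for a user's groups, or None if none match."""
    matched = {GROUP_TO_AIRFLOW_ROLE[g] for g in group_ids if g in GROUP_TO_AIRFLOW_ROLE}
    for role in ROLE_PRECEDENCE:
        if role in matched:
            return role
    return None  # not in any Airflow group: the authorizer should deny access
```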
Deploying the Amazon MWAA Environment Stack
We provide two CloudFormation templates to establish the services outlined in the architecture. Note that deploying these CloudFormation stacks will incur AWS usage fees.
The first CloudFormation stack sets up the following resources:
- A VPC with two public and three private subnets, along with the required route tables, NAT gateway, internet gateway, and security group.
- VPC endpoints essential for the Amazon MWAA environment.
- An Amazon Cognito user pool and user pool domain.
- Application Load Balancer.
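Each of the five subnets needs its own non-overlapping CIDR block inside the VPC range. If you adapt the template, Python's `ipaddress` module offers a quick way to plan them. This is a minimal sketch, assuming a hypothetical `10.192.0.0/16` VPC and `/24` subnets; the template's actual CIDR parameters may differ, and the subnet names are illustrative. The sketch doesn't handle Availability Zone placement, and the two Amazon MWAA private subnets must be in different Availability Zones.

```python
import ipaddress
from itertools import islice

# Hypothetical subnet names, in the order blocks are assigned.
SUBNET_NAMES = [
    "public-1",        # internet-facing ALB
    "public-2",        # internet-facing ALB
    "private-mwaa-1",  # Amazon MWAA environment
    "private-mwaa-2",  # Amazon MWAA environment
    "private-lambda",  # Lambda authorizer (outbound via the NAT gateway)
]


def plan_subnets(vpc_cidr: str, prefix: int = 24) -> dict:
    """Assign consecutive, non-overlapping /prefix blocks from vpc_cidr."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = list(islice(network.subnets(new_prefix=prefix), len(SUBNET_NAMES)))
    if len(blocks) < len(SUBNET_NAMES):
        raise ValueError(f"{vpc_cidr} is too small for {len(SUBNET_NAMES)} /{prefix} subnets")
    return {name: str(block) for name, block in zip(SUBNET_NAMES, blocks)}


if __name__ == "__main__":
    for name, cidr in plan_subnets("10.192.0.0/16").items():
        print(f"{name:15} {cidr}")
```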